1. INTRODUCTION

Brain tumors are among the most critical and serious disorders. A brain tumor occurs when
unchecked, unregulated cell proliferation takes place in the brain. Meningioma, glioma, and
pituitary tumors are frequent brain tumors. Identifying, categorizing, and analyzing brain tumors early on is
essential to treating them effectively. Meningiomas are benign tumors most frequently found in the thin
membranes protecting the brain and spinal cord. A high-grade glioma, in contrast, is an aggressive brain tumor
with a survival time of about two years. Pituitary tumors result from atypical cell proliferation in the
pituitary gland of the brain. Among deaths involving tumors of the central nervous system, brain tumors
rank 10th among the most frequent causes of death in both women and men [1]. Reports estimate that,
across all cancer types worldwide, about 40% of brain tumors develop through metastasis [2]. In an effort
to raise public awareness and educate the public about brain tumors, the 8th of June was declared
World Brain Tumor Day in 2000 [3]. According to the 2016 classification of the World Health
Organization (WHO), brain tumors can be divided into four groups depending on their molecular
characteristics and histopathology [3], [4]. Patients with advanced brain cancer have an extremely
poor chance of survival [5].

Accurate and timely grading and diagnosis of cancer therefore improve prognosis and treatment
options. Mortality from brain tumors can be reduced if they are detected and treated at an early stage.
Tumor grade and diagnosis are determined by neurological examinations, imaging, and biopsies [3], [6].
Before and after treatment, doctors use magnetic resonance imaging (MRI) to determine the tumor's shape,
so that surgical resections can be scheduled and followed up when the condition worsens [7]. A successful
prognosis depends on the early classification of brain tumor grade [6]. Owing to its non-invasive nature
and contrast enhancement, MRI is the preferred imaging method for diagnosing gliomas [7]. Radiologists
diagnose tumors with the conventional manual method, which is inefficient and labour-intensive. In
computer-aided medical diagnosis (CAMD), AI and deep learning have made great strides, enabling doctors
to interpret medical images in a few seconds [8]. The effectiveness of deep learning is greatly influenced
by the size and quality of the dataset, and deep learning techniques require high-quality image
annotations.

Labelling enormous numbers of medical images is both time- and expertise-intensive [9]. The lack
of expert annotations and image data has hampered deep learning for medical imagery [9]. These
difficulties have been addressed in a number of different ways. When only limited domain samples are
available for training, a transfer learning approach may be advantageous. A network pre-trained on a
large, labelled dataset is usually fine-tuned on the target dataset. Applying the learned information to
the target dataset increases convergence speed and decreases computational complexity [10]. This study
aims to identify and categorize brain MRI images into glioma, meningioma, no tumor, and pituitary classes
at an early stage, thereby reducing the risk of death by supporting experts in providing more effective
and efficient treatment. Removing noise and artefacts is crucial to achieving excellent performance from a
convolutional neural network (CNN) model. Furthermore, interpretation may be challenging due to the
similarities between tumor-affected areas and unaffected brain tissue. To improve the visibility of
tumorous lesions, the contrast and brightness levels of MRI images need to be balanced. This study uses a
fully automated and reliable deep learning model, a fine-tuned and hyperparameter-tuned VGG16 built
through an ablation study and transfer learning, to predict brain tumors in MRI images.

Seere and Karibasappa [11] proposed an approach to distinguish between diseased and normal
brain tissues. The system performed segmentation based on thresholding and the watershed method and then,
using an SVM classifier, achieved an overall classification accuracy of 85.32% [11]. The model in [12]
effectively discriminated between diseased and normal brain slices, using k-fold cross-validation,
with an overall classification accuracy of 92.14% [12].

Ullah et al. [13] proposed an algorithm for classifying brain tumors based on brain MRIs
obtained from the Radiology Department of Bahawal Victoria Hospital (RD-BVH). Extracting intensity,
texture, and shape features from brain MRI slices led to an accuracy of 97%. Anaraki et al. [14]
introduced a method for classifying brain tumors using CNNs and genetic algorithms. The network training
technique proposed by Biswas and Islam [15], based on the Levenberg-Marquardt algorithm, provides 97.83%
specificity, 94.58% sensitivity, and 95.4% accuracy. Avşar and Salçin [16] developed a faster
region-based CNN (Faster R-CNN) to identify and classify brain tumors in MRI images; the model achieved
an accuracy of 91.66%.

A method for classifying brain tumors in MRI based on grayscale, symmetry, and texture features
is presented in [17]. Precious et al. [18] compared three optimizers, namely ADAM, SGDM, and RMSprop,
which yielded detection rates of 98.1%, 92.5%, and 83.0%, respectively. To model expert knowledge,
Papageorgiou et al. [19] developed a fuzzy cognitive map (FCM). The addition of an activation
Hebbian learning method enhanced the classification abilities of the FCM ranking method. One hundred
cases and medical resources were used to validate the suggested technique. Schmeelk [20] used a
two-dimensional wavelet transform to work with two-dimensional images and thoroughly compared the two
transform techniques applied to the divided elements.


2. METHOD 

To find the best transfer learning model for classification, this study analyzed five pre-trained
networks, MobileNetV2, InceptionV3, VGG16, MobileNet, and VGG19, which were trained and tested on the
dataset. Figure 1 shows the process of preparing and analyzing the brain tumor MRI image dataset. The
first step processes the dataset with several techniques to improve visual quality: removal of
speckle noise using a median filter, removal of artifacts with morphological closing, and brightness
adjustment using contrast limited adaptive histogram equalization (CLAHE). The next step balances the
dataset using data augmentation techniques, which addresses the problem of class imbalance. The
transfer learning models are then trained and evaluated on the processed dataset; of these, VGG16
performs the best. This model is fine-tuned through an ablation study to maximize its performance.
Finally, the performance of the fine-tuned transfer learning model is analyzed to evaluate its
effectiveness. This process helps in understanding the strengths and weaknesses of the model and
identifying areas for further improvement.

Figure 1. Workflow of the entire classification


2.1. Dataset description and training approach 

A total of 3264 MRI scans from the brain tumor MRI dataset were examined in this study. The
dataset comprises four classes: pituitary, meningioma, glioma, and no tumor. The pituitary class
contains 901 images, meningioma contains 937 images, glioma contains 926 images, and the no tumor class
has the fewest images, 500. Each image is grayscale with a size of 224x224 pixels. The dataset was
collected from the openly accessible website Kaggle. Three splitting ratios are commonly used (90:10,
80:20, and 70:30); in a recent study, 20% of the data was used for testing the final outcome [21]. The
models are trained for a maximum of 100 epochs with a batch size of 16. The best model's weights were
saved during training using a Keras callback that monitors for the minimum loss value. Adam was employed
for optimization at a learning rate of 0.001. For multiclass problems, categorical cross-entropy is the
default loss function [22]. Softmax activation is used to predict the probability of each class. The
softmax outputs always sum to 1, because the function normalizes all values to the range between
0 and 1.
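
The softmax output and the categorical cross-entropy loss described above can be illustrated with a minimal, library-free sketch. The logits and the one-hot label below are hypothetical values chosen for illustration, and the class order is an assumption; this is not the Keras code used in the study.

```python
import math

def softmax(logits):
    """Map raw scores to probabilities that lie in (0, 1) and sum to 1."""
    shift = max(logits)  # subtracting the maximum improves numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for one sample: -sum_k y_k * log(p_k) over the classes."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

# Hypothetical logits for four classes (assumed order: glioma, meningioma,
# no tumor, pituitary); these numbers are illustrative only.
logits = [2.0, 0.5, -1.0, 0.1]
probs = softmax(logits)
target = [1, 0, 0, 0]  # one-hot label for the first class
loss = categorical_cross_entropy(target, probs)
```

Because the label is one-hot, the loss reduces to the negative log-probability assigned to the true class, so it shrinks as the softmax output for that class approaches 1.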


2.2. Image pre-processing 

The images in the dataset contain considerable noise and many artifacts. Because such defects
frequently degrade images, image processing is applied as a preliminary step before training the deep
learning models, with the goal of increasing the model's accuracy. Morphological closing is used to
remove artifacts from the images, and a median filter is applied for noise removal.

Figure 2 depicts the image preprocessing techniques applied to the brain tumor MRI dataset. First,
a median filter is used to remove speckle noise from the images. Morphological closing is then used to
remove artifacts from the images [23]. Finally, the image is enhanced using CLAHE, which improves its
brightness and sharpness.

CLAHE helps to make the features of the brain tumor more visible and distinguishable [24],
because enhancing local contrast improves the legibility of medical images [25]. These preprocessing
steps are needed to improve the quality of brain tumor MRI images and prepare them for analysis by
machine learning algorithms. The resulting images have less noise, fewer artifacts, and enhanced
features, which aids in locating brain tumors precisely in the processed images.

Figure 2. Applied image pre-processing techniques
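
The median filtering and morphological closing steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: images are nested lists of grayscale values, the kernel is a 3x3 square, and borders are handled by clamping; none of these choices is stated in the text. CLAHE is omitted because it is considerably longer; in practice an image library would typically provide all three operations (for example, OpenCV offers `cv2.medianBlur`, `cv2.morphologyEx`, and `cv2.createCLAHE`), although the text does not say which library was used.

```python
from statistics import median

def _window(img, r, c, k):
    """Collect the k x k neighbourhood around (r, c), clamping at the borders."""
    h, w, half = len(img), len(img[0]), k // 2
    return [img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
            for dr in range(-half, half + 1)
            for dc in range(-half, half + 1)]

def _filter(img, k, reducer):
    """Apply `reducer` (median, max, or min) over every k x k neighbourhood."""
    return [[reducer(_window(img, r, c, k)) for c in range(len(img[0]))]
            for r in range(len(img))]

def median_filter(img, k=3):
    """Replace each pixel by the median of its neighbourhood (noise removal)."""
    return _filter(img, k, median)

def morphological_closing(img, k=3):
    """Grayscale closing: dilation (local max) followed by erosion (local min)."""
    dilated = _filter(img, k, max)
    return _filter(dilated, k, min)

# Toy 5x5 images (illustrative values only).
noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255                      # a single bright noise pixel
denoised = median_filter(noisy)        # the outlier is replaced by 100

holed = [[200] * 5 for _ in range(5)]
holed[2][2] = 0                        # a one-pixel dark artifact
closed = morphological_closing(holed)  # the dark hole is filled with 200
```

The median removes isolated outliers without averaging them into their neighbours, while closing fills dark gaps smaller than the kernel and leaves larger structures largely unchanged.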


2.3. Verification 

The quality of the preprocessed images can be evaluated numerically with several measures,
including mean square error (MSE), peak signal to noise ratio (PSNR), and the structural similarity
index method (SSIM). MSE is the most fundamental and most frequently used error term: it is the
cumulative squared error between the processed image and the raw image, so a lower MSE value indicates a
smaller error. The PSNR, which indicates how well an image has been compressed or reconstructed, rises
as the MSE falls. The SSIM measures how much the preprocessing algorithms degrade image quality.
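
The relationship between MSE and PSNR can be made concrete with a short sketch. It assumes 8-bit grayscale images stored as nested lists, so the peak value is taken to be 255, and it uses the standard definition PSNR = 10 log10(MAX^2 / MSE); the pixel values are hypothetical. SSIM is not included here.

```python
import math

def mse(original, processed):
    """Mean of the squared pixel-wise differences between two equal-size images."""
    total, count = 0.0, 0
    for row_o, row_p in zip(original, processed):
        for a, b in zip(row_o, row_p):
            total += (a - b) ** 2
            count += 1
    return total / count

def psnr(original, processed, max_value=255.0):
    """Peak signal to noise ratio in dB; infinite when the images are identical."""
    err = mse(original, processed)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)

# Hypothetical 2x2 raw and processed images (illustrative values only).
raw = [[100, 100], [100, 100]]
processed = [[100, 102], [98, 100]]
error = mse(raw, processed)     # squared differences 0, 4, 4, 0 average to 2.0
quality = psnr(raw, processed)  # higher values mean the images are closer
```

Since the MSE appears in the denominator of the logarithm, any reduction in MSE raises the PSNR.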


2.4. Ablation study 

In CNN-based applications, an ablation study alters or removes individual layers and
hyperparameters to evaluate their effect on the model's performance and stability. In this study,
hyperparameter ablation is used to obtain a robust, well-tuned network. Five case studies were
conducted on the augmented MRI dataset.
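
A one-factor-at-a-time ablation loop of this kind can be sketched as below. The sketch assumes that each case study varies a single setting and that the best value is carried forward to the next case study, which the text does not state explicitly. The function `toy_evaluate` is a hypothetical placeholder that stands in for training and testing a model; its scores are not experimental results, and the configuration keys are illustrative names.

```python
def run_ablation(base_config, search_space, evaluate):
    """Vary one setting per case study, keep the best value, and move on."""
    config = dict(base_config)
    history = []
    for name, candidates in search_space.items():
        best_value, best_score = None, float("-inf")
        for value in candidates:
            trial = {**config, name: value}   # change only one setting
            score = evaluate(trial)
            history.append((name, value, score))
            if score > best_score:
                best_value, best_score = value, score
        config[name] = best_value             # carry the winner forward
    return config, history

# Candidate values mirror the five case studies; key names are assumptions.
SEARCH_SPACE = {
    "head": ["flatten", "global_max_pooling", "global_average_pooling"],
    "batch_size": [16, 32, 64, 128],
    "learning_rate": [0.01, 0.001, 0.0001],
    "loss": ["binary_crossentropy", "categorical_crossentropy",
             "mean_squared_error", "mean_absolute_error",
             "mean_squared_logarithmic_error"],
    "optimizer": ["adam", "nadam", "sgd", "adamax"],
}
BASE_CONFIG = {"head": "flatten", "batch_size": 16, "learning_rate": 0.001,
               "loss": "categorical_crossentropy", "optimizer": "adam"}

# Hypothetical scoring target used only to make the sketch runnable.
PREFERRED = {**BASE_CONFIG, "batch_size": 32}

def toy_evaluate(config):
    """Placeholder for real training: counts settings that match PREFERRED."""
    return sum(config[key] == value for key, value in PREFERRED.items())

best_config, history = run_ablation(BASE_CONFIG, SEARCH_SPACE, toy_evaluate)
```

In a real experiment, `evaluate` would train the network with the given configuration and return its test accuracy, and `history` would hold one row per table entry of the case studies.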


3. RESULTS AND DISCUSSION 
3.1. Result of transfer learning models 

Table 1 illustrates the outcome of five transfer learning models for a particular task. The table 
presents six metrics for each model, including test accuracy, validation (Val) accuracy, train accuracy, train 
loss, test loss, and val loss. The five transfer learning models presented in the table are VGG-16, VGG-19, 
MobileNet, MobileNet V2, and InceptionV3. From the table, we can see that VGG-16 has the highest train 
accuracy (97.77%), test accuracy (96.13%), and val accuracy (96.83%) among the five models, indicating 
that it performs the best on the given task. On the other hand, InceptionV3 has the lowest train accuracy 
(78.76%), test accuracy (77.81%), and val accuracy (77.86%), suggesting that it performs the worst among 
the five models. 


Table 1. Results of five transfer learning models
Model         Train accuracy (%)  Test accuracy (%)  Val accuracy (%)  Train loss  Test loss  Val loss
VGG-16        97.77               96.13              96.83             0.18        0.19       0.12
VGG-19        97.45               96.07              96.64             0.21        0.23       0.23
MobileNet     95.98               95.23              95.23             0.17        0.29       0.28
MobileNet V2  96.21               95.78              95.78             0.21        0.31       0.32
Inception V3  78.76               77.81              77.86             0.4         0.4        0.41


3.2. Result of ablation study

Modifying certain design elements can enhance classification accuracy and improve overall 
reliability. To explore these improvements, five ablation investigations were conducted, where different 
components were modified in the VGG16 model. These alterations were aimed at creating a finely tuned 
model with enhanced performance. 


3.2.1. Case study 1: flatten layer alterations 

Table 2 shows that using the flatten layer yields the best accuracy. Global pooling methods do
not offer better performance: global average pooling and global max pooling reach test accuracies of
95.19% and 95.22%, respectively, whereas the flatten layer yields 96.13%.


Table 2. Altering flatten layers
Case study 01
Configuration no.  Flatten layer types     Epochs x training times (s)  Test accuracy (%)  Findings
1                  Flatten                 97x5                         96.13              Maximum accuracy
2                  Global max pooling      61x4                         95.22              Modest accuracy
3                  Global average pooling  67x5                         95.19              Modest accuracy


3.2.2. Case study 2: changing the batch size

Table 3 shows the effect of batch size on the test accuracy of the model. The table presents
four configurations with different batch sizes, epochs, training times, and test accuracies.
Configuration no. 2 provides the maximum accuracy of 96.93%, with a batch size of 32 and the model
trained for 43 epochs with a training time of 4 seconds. The table suggests that choosing a suitable
batch size is important for achieving the highest accuracy.


Table 3. Altering the batch size
Case study 02
Configuration no.  Batch size  Epochs x training times (s)  Test accuracy (%)  Findings
1                  16          97x5                         96.13              Modest accuracy
2                  32          43x4                         96.93              Highest accuracy
3                  64          82x5                         93.92              Modest accuracy
4                  128         27x5                         93.45              Modest accuracy


3.2.3. Case study 3: changing learning rate 

Table 4 shows the effect of different learning rates on the model's accuracy. The highest
accuracy of 99.21% is achieved in configuration no. 2, where the model is trained with a learning rate
of 0.001 for 97 epochs with a training time of 5 seconds. The other learning rates, 0.01 and 0.0001,
gave lower accuracies of 98.41% and 98.32%, respectively.


Table 4. Altering learning rates
Case study 03
Configuration no.  Learning rates  Epochs x training times (s)  Test accuracy (%)  Findings
1                  0.01            92x55                        98.41              Accuracy dropped
2                  0.001           97x5                         99.21              Highest accuracy
3                  0.0001          68x57                        98.32              Accuracy improved


3.2.4. Case study 4: changing the loss function 

The findings of a case study on the impact of various loss functions on a transfer learning model's 
test accuracy are presented in Table 5. The table presents five configurations with different loss functions, 
epochs, training times, and test accuracies. The highest test accuracy of 96.93% is achieved in configuration 
no. 2, where the model is trained with the categorical cross-entropy loss function for 97 epochs with a 
training time of 5 seconds. In this case, the categorical cross-entropy loss function obtained the highest 
performance, while other loss functions resulted in accuracy drops or errors.


Table 5. Altering the loss function
Case study 04
Configuration no.  Loss functions                  Epochs x training times (s)  Test accuracy (%)  Findings
1                  Binary crossentropy             Error                        Error              Error
2                  Categorical crossentropy        97x5                         96.93              Maximum accuracy
3                  Mean squared error              97x5                         96.79              Modest accuracy
4                  Mean absolute error             49x4                         69.46              Low accuracy
5                  Mean squared logarithmic error  46x5                         97.78              Modest accuracy


3.2.5. Case study 5: changing optimizers 

Table 6 presents the results of utilizing different optimizers on the test accuracy of the VGG-16 
model. The Adam optimizer achieved the highest accuracy of 98.41% in configuration no. 1, where the 
model is trained for 97 epochs with a training time of 5 seconds. In this case, the Adam optimizer 
outperformed the other optimizers, including Nadam, SGD, and Adamax, which resulted in accuracy drops. 


Table 6. Altering optimizers
Case study 05
Configuration no.  Optimizers  Epochs x training times (s)  Test accuracy (%)  Findings
1                  Adam        97x5                         98.41              Maximum accuracy
2                  Nadam       44x5                         96.93              Previous dropped
3                  SGD         89x5                         86.22              Modest accuracy
4                  Adamax      75x5                         91.59              Modest accuracy


3.3. Performance analysis of best model 

After the ablation study, the fine-tuned VGG-16 model achieved a test accuracy of 99.21%. The
model was trained on 224x224-pixel images using the Adam optimizer with a batch size of 32 and a
learning rate of 0.001 for 90 epochs. It used the softmax activation function and a dropout rate of
0.5, along with a momentum of 0.9. These results suggest that the configuration of the model,
including the choice of optimizer, activation function, dropout rate, and other hyperparameters, can
significantly affect its accuracy.


3.4. Performance analysis and statistical analysis 

Table 7 presents the performance evaluation and statistical analysis of the fine-tuned model.
The model reached an accuracy of 99.21%. The table also reports the false positive rate (FPR), false
discovery rate (FDR), false negative rate (FNR), Kappa coefficient (KC), and Matthews correlation
coefficient (MCC), together with the precision, recall, specificity, and F1 score of the model. The
values indicate that the model has high accuracy and performs well on most evaluation metrics.


Table 7. Performance evaluation
Accuracy  FPR (%)  FDR (%)  FNR (%)  KC (%)  MCC (%)  Precision  Recall  Specificity  F1 score
99.21     1.55     2.56     2.41     99.04   2.23     97.65      89.108  96.124       96.89
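
The metrics listed in Table 7 follow from confusion-matrix counts. The sketch below computes them for a single class in a one-vs-rest setting using the standard textbook definitions; how the study aggregated the four classes is not stated, and the counts used here are hypothetical rather than taken from the experiments.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive/negative counts.

    Assumes every denominator is non-zero (no empty class or prediction set).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Agreement expected by chance, used by Cohen's kappa.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),   # false positive rate = 1 - specificity
        "fnr": fn / (fn + tp),   # false negative rate = 1 - recall
        "fdr": fp / (fp + tp),   # false discovery rate = 1 - precision
        "mcc": (tp * tn - fp * fn) / math.sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }

# Hypothetical counts for one class versus the rest (illustrative only).
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

The complementary pairs (FPR and specificity, FNR and recall, FDR and precision) each sum to 1, which provides a quick consistency check on any reported table of these metrics.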


4. CONCLUSION 

Deep learning systems used in medical imaging to identify tumors require substantial annotated
training datasets. Manually annotating images often requires a radiology subspecialist, and the
prohibitive cost of annotation hampers the advancement of AI in healthcare imaging. Expertise and time
are also at a premium because the AI field develops very fast. Transfer learning techniques have been
created to build competitive classification systems at low annotation expense: they allow models to
recognize and categorize new data using the knowledge gathered from large datasets. Using a transfer
learning model, this study proposes a system for categorizing brain tumor MRI images more accurately,
which may help reduce death rates. In the experiments, artefacts and noise were removed from the images
using several preprocessing techniques, and five transfer learning models were evaluated on the brain
tumor MRI dataset. The proposed model attained the best accuracy after its hyperparameters were tuned
through the ablation study. In the near future, the efficacy of the proposed model can be evaluated on
real-time medical data with larger quantities of unprocessed medical images. The proposed model
accurately categorizes the four classes of brain MRI images in most tests. Despite a few minor
drawbacks, the results indicate that the proposed well-tuned VGG16 model is precise and performs well
across all diagnosis classes.